A parameter estimation problem is considered, in which dispersed 
sensors transmit to the statistician partial information regarding their 
observations. The sensors observe the paths of continuous semimartin- 
gales, whose drifts are linear with respect to a common parameter. 
A novel estimating scheme is suggested, according to which each sen- 
sor transmits only one-bit messages at stopping times of its local 
filtration. The proposed estimator is shown to be consistent and, for 
a large class of processes, asymptotically optimal, in the sense that 
its asymptotic distribution is the same as the exact distribution of 
the optimal estimator that has full access to the sensor observations. 
These properties are established under an asymptotically low rate of 
communication between the sensors and the statistician. Thus, de- 
spite being asymptotically efficient, the proposed estimator requires 
minimal transmission activity, which is a desirable property in many 
applications. Finally, the case of discrete sampling at the sensors is 
studied when their underlying processes are independent Brownian 
motions. 

1. Introduction. Consider a number of dispersed sensors, each one of 
which observes the path of a real-valued stochastic process. The joint distri- 
bution of these processes is assumed to belong to some parametric family. 
The goal is to estimate the unknown parameter at a central location (fusion 
center) that receives information from all sensors. 

When the sensors transmit their complete observations to the fusion cen- 
ter, we have a classical (centralized) parameter estimation problem. However, 
the fusion center often does not have full access to the sensor observations 
due to practical considerations, such as limited communication bandwidth. 
These communication constraints are present in applications such as mobile 
and wireless communication, data fusion, environmental monitoring and distributed 
surveillance, in which it is crucial to minimize the congestion in the 
network and the computational burden at the fusion center (see, e.g., Foresti 
et al. [6]). 

Under this setup, which is often called decentralized, each sensor needs 
to transmit a small number of bits per communication to the fusion cen- 
ter and it is clear that the classical (centralized) statistical techniques are 
no longer applicable. As a result, there has been a great interest in decen- 
tralized formulations of statistical problems (see, e.g., the review papers by 
Viswanathan and Varshney [24], Blum et al. [1], Han and Amari [10] and 
Veeravalli [23]). 

Parameter estimation under a decentralized setup has been studied ex- 
tensively using information-theoretic techniques. More specifically, it is of- 
ten assumed that there are two correlated sensors, each of which observes 
a sequence of independent and identically distributed (i.i.d.), finite-valued 
random variables whose joint probability mass function is determined by the 
unknown parameter. The sensors are then required to transmit to the fusion 
center messages that belong to alphabets of smaller size than those of the 
original observations. The review paper by Han and Amari [9] describes in 
detail the main advances in this line of research. On the other hand, Luo [15] 
and Xiao and Luo [25] considered an arbitrary number of independent sen- 
sors that take i.i.d. observations with a common mean, which is the unknown 
parameter. Assuming that the parameter space and the support of the noise 
distribution are both compact intervals, they constructed decentralized esti- 
mating schemes that require the transmission of a small number of bits per 
communication. 

In all the above papers, the sensors collect i.i.d. observations at a sequence 
of discrete times and transmit a small number of bits to the fusion center 
at every such sampling time. Moreover, even under an asymptotically large 
horizon of observations, the resulting estimators have larger mean square 
errors than the corresponding optimal centralized estimators, which have 
full access to the sensor observations. 

In this paper the goal is to construct a decentralized estimating scheme 
that requires minimal communication activity from the sensors and achieves 
asymptotically the mean square error of the optimal centralized estimator, 
under a general statistical model for the sensor observations. In particular, 
we assume that the sensors observe the paths of continuous semimartingales 
whose drifts are linear with respect to the unknown parameter. 

The centralized version of this problem is well understood. For Gaussian 
processes with independent increments, the fixed-horizon maximum like- 
lihood estimator (MLE) was studied by Grenander [8] and Striebel [22]. 
Brown and Hewitt [2] proved that the MLE is consistent and asymptotically 
normal for stationary and ergodic time-homogeneous diffusions. Feigin [4] 
established the same properties for more general diffusions, assuming that 
the score process is a martingale. Liptser and Shiryaev ([13], pages 225-236) 
studied the MLE for a diffusion-type process and computed its bias and vari- 
ance in the Ornstein-Uhlenbeck case. For a diffusion-type process with lin- 
ear drift with respect to the unknown parameter, Liptser and Shiryaev [13], 
pages 244-248, and earlier Novikov [18], suggested a sequential version of 
the MLE and proved that it is unbiased and that it attains a prescribed 
accuracy. In the particular case of a square root diffusion, Brown and He- 
witt [3] suggested an alternative sequential estimator with similar optimality 
properties. Melnikov and Novikov [17] and Galtchouk and Konev [7] studied 
least-squares sequential estimators that attain a prescribed accuracy in a 
multidimensional semimartingale regression model, generalizing in this way 
the results of Novikov [18]. We refer to Kutoyants [12] and Rao [19] for exhaustive 
references in the statistical inference of diffusion and diffusion-type 
processes. 

Apart from the statistical model for the sensor observations, our work dif- 
fers from previous approaches in some other important aspects as well. First 
of all, we do not assume that the frequency with which a sensor transmits its 
messages to the fusion center (communication rate) is the same as the frequency 
with which it collects its local observations (sampling rate). Instead, 
we assume that the sensors observe their underlying processes continuously, 
but communicate with the fusion center at discrete times. Therefore, in our 
context, the incurred loss of information is not only due to the quantization 
of sensor observations, but also due to the discrete transmission of messages 
to the fusion center in comparison to the continuous flow of information at 
the sensors. 

Moreover, we do not require that the sensors communicate with the fusion 
center at deterministic and equidistant times. Instead, we allow each sensor 
to transmit its messages to the fusion center at random times that are trig- 
gered by its local observations. In particular, we propose a communication 
scheme according to which the sensors transmit only one-bit messages at first 
exit times of appropriate, locally-observed statistics (see Rabi et al. [20] and 
Fellouris and Moustakides [5] for similar communication schemes in different 
decentralized problems). Based on this communication scheme, we construct 
an estimator that is always consistent, even when the sensor processes are 
dependent. 

However, the main result of this paper is that, in certain cases, the asymp- 
totic distribution of the proposed estimator is the same as the exact distri- 
bution of the corresponding optimal centralized estimator. In particular, this 
holds when the sensor processes are arbitrary, orthogonal continuous semi- 
martingales, as well as when they are correlated Gaussian processes with 
independent increments. 

More importantly, these asymptotic properties are established as the hori- 
zon of observations goes to infinity and as the rate of communication between 
sensors and the fusion center goes to zero. Thus, although the proposed estimator 
is statistically efficient, it requires minimal communication activity 
from the sensors, which is a very desirable property in applications with 
severe communication constraints. 

Finally, we consider in more detail the special case in which the sensors 
observe independent Brownian motions, since the tractability of this model 
allows us to obtain additional insight regarding the suggested estimating 
scheme. In this context, we also consider the case of discrete sampling, where 
the sensors do not observe their underlying processes continuously, but at a 
sequence of discrete times. It is shown that the proposed estimator remains 
consistent for any fixed sampling frequency, as long as the sensors have an 
asymptotically low rate of communication with the fusion center. However, 
asymptotic optimality does require a sufficiently high sampling rate, which 
we determine as a function of the communication rate and the observation 
horizon. 

The rest of the paper is organized as follows: in Section 2 we formulate 
the problem under consideration. In Section 3 we specify the proposed esti- 
mating scheme and analyze its asymptotic properties. In Section 4 we focus 
on the special case that the sensors observe independent Brownian motions. 
We conclude in Section 5. 

2. Problem formulation. In what follows, we denote by i the generic 
sensor, where $i = 1, \ldots, K$. We assume that sensor i observes the path of 
a continuous stochastic process $Y^i = \{Y^i_t\}_{t \ge 0}$ and is able to compute any 
statistic that is adapted to the filtration generated by $Y^i$. 

In this section we specify the dynamics of $(Y^1, \ldots, Y^K)$ under a family 
of probability measures $\{P_\lambda, \lambda \in \mathbb{R}\}$, we review standard results regarding 
the centralized estimation of the unknown parameter $\lambda$ and we define the 
notion of an (asymptotically optimal) decentralized estimator. 

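2.1. Statistical model. Let $(Y^1, \ldots, Y^K)$ be the coordinate process on 
the canonical space of continuous functions $(\Omega, \mathcal{F})$, where $\Omega := C[0,\infty)^K$ 
and $\mathcal{F} := \mathcal{B}(\Omega)$ is the associated Borel $\sigma$-algebra. We denote by $\{\mathcal{F}^i_t\}$ the 
right-continuous version of the natural filtration generated by $Y^i$ and by 
$\{\mathcal{F}_t\}$ the corresponding global filtration, that is, 

(2.1)  $\mathcal{F}^i_t := \mathcal{C}^i_{t+}, \quad \mathcal{C}^i_t := \sigma(Y^i_s;\ 0 \le s \le t),$ 

(2.2)  $\mathcal{F}_t := \mathcal{C}_{t+}, \quad \mathcal{C}_t := \sigma(Y^i_s;\ 0 \le s \le t,\ 1 \le i \le K).$ 

Let also $P_0$ be a probability measure on $(\Omega, \mathcal{F})$ so that 

       $Y^i \in \mathcal{M}_0 \quad \forall\, 1 \le i \le K,$ 

where $\mathcal{M}_0$ is the class of continuous $P_0$-local martingales that start from 0. 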
For every $1 \le i, j \le K$, we denote by $\langle Y^i, Y^j \rangle$ the quadratic covariation 
of $Y^i$ and $Y^j$ and we assume that $X^i$ is an $\{\mathcal{F}^i_t\}$-progressively measurable 
process so that 

(2.3)  $P_0\Bigl( \int_0^t |X^i_s|^2 \, d\langle Y^i, Y^i \rangle_s < \infty \Bigr) = 1 \quad \forall\, 0 \le t < \infty.$ 

Then, we can define the stochastic integral 

(2.4)  $B_t := \sum_{i=1}^K \int_0^t X^i_s \, dY^i_s, \quad t \ge 0,$ 

and we denote by A its quadratic variation, that is, 

(2.5)  $A_t := \langle B, B \rangle_t = \sum_{i=1}^K \sum_{j=1}^K \int_0^t X^i_s X^j_s \, d\langle Y^i, Y^j \rangle_s, \quad t \ge 0.$ 

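Moreover, we assume that the Novikov-type condition 

(A1)  $E_0[e^{(\lambda^2/2) A_t}] < \infty \quad \forall\, 0 \le t < \infty$ 

is satisfied for every $\lambda \ne 0$, which allows us to define for every $\lambda \ne 0$ the 
probability measure $P_\lambda$ in the following way: 

(2.6)  $\frac{dP_\lambda}{dP_0}\Big|_{\mathcal{F}_t} := e^{\lambda B_t - (\lambda^2/2) A_t}, \quad \forall\, 0 \le t < \infty.$ 

Then, if we denote by $\mathcal{M}_\lambda$ the class of continuous $P_\lambda$-local martingales that 
start from 0, Girsanov's theorem (see [21], page 331) implies that 

(2.7)  $N^i := Y^i - \langle Y^i, \lambda B \rangle \in \mathcal{M}_\lambda \quad \forall\, i = 1, \ldots, K$ 

and, consequently, $\langle N^i, N^j \rangle = \langle Y^i, Y^j \rangle$ for every $i \ne j$. Therefore, from (2.4) 
and (2.7) it follows that under $P_\lambda$ 

(2.8)  $Y^i_t = \lambda \sum_{j=1}^K \int_0^t X^j_s \, d\langle Y^i, Y^j \rangle_s + N^i_t, \quad t \ge 0,\ 1 \le i \le K.$ 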

2.2. The parameter estimation problem. The goal is to estimate the unknown 
parameter $\lambda$ using the information that is being transmitted from the 
sensors to the fusion center. The flow of this information can be described 
by a sub-filtration of $\{\mathcal{F}_t\}$ and is determined by the communication scheme 
that is chosen by the statistician. 

Let $\{\mathcal{G}_t\} \subset \{\mathcal{F}_t\}$ be the fusion center filtration. We will say that: 

(a) $(\phi_t)_{t>0}$ is a fixed-horizon, $\{\mathcal{G}_t\}$-adapted estimator of $\lambda$, if $\phi_t$ is a 
$\mathcal{G}_t$-measurable statistic for every $t > 0$. 
(b) $(T_\gamma, \phi_\gamma)_{\gamma>0}$ is a sequential, $\{\mathcal{G}_t\}$-adapted estimator of $\lambda$, if $(T_\gamma)_{\gamma>0}$ 
is an increasing family of $\{\mathcal{G}_t\}$-stopping times and $\phi_\gamma$ is a $\mathcal{G}_{T_\gamma}$-measurable 
statistic for every $\gamma > 0$. 

We will say that a $\{\mathcal{G}_t\}$-adapted estimator, either fixed-horizon or sequential, 
is decentralized, when the fusion center filtration $\{\mathcal{G}_t\}$ is of the form 

(2.9)  $\mathcal{G}_t = \sigma(\sigma^i_n, \chi^i_n \mid \sigma^i_n \le t,\ i = 1, \ldots, K), \quad t \ge 0,$ 

where $(\sigma^i_n)_{n \in \mathbb{N}}$ is an increasing sequence of $\{\mathcal{F}^i_t\}$-stopping times and each 
$\chi^i_n$ is an $\mathcal{F}^i_{\sigma^i_n}$-measurable statistic that takes values in a finite set. In other 
words, a decentralized estimator must rely on quantized versions of the sensor 
observations, which may be transmitted to the fusion center at stopping 
times of the local sensor filtrations. 

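If the fusion center learns the complete sensor observations at any time t, 
then it can construct $\{\mathcal{F}_t\}$-adapted estimators, which we will call centralized. 
Assuming that for every $\lambda \in \mathbb{R}$, 

(A2)  $P_\lambda(A_t > 0) = 1 \quad \forall\, t > 0,$ 

the centralized, fixed-horizon MLE of $\lambda$ at some time $t > 0$ is 

(2.10)  $\hat\lambda_t := \frac{B_t}{A_t},$ 

that is, the maximizer of the corresponding log-likelihood function 

(2.11)  $\ell_t(\lambda) := \log \frac{dP_\lambda}{dP_0}\Big|_{\mathcal{F}_t} = \lambda B_t - \frac{\lambda^2}{2} A_t.$ 

From (2.11) we also obtain the corresponding score process and (observed) 
Fisher information, that is, 

(2.12)  $M_t := \frac{d\ell_t(\lambda)}{d\lambda} = B_t - \lambda A_t, \quad -\frac{d^2\ell_t(\lambda)}{d\lambda^2} = A_t,$ 

and, consequently, we have 

(2.13)  $\hat\lambda_t = \lambda + \frac{M_t}{A_t}, \quad t > 0.$ 

Moreover, from (2.4), (2.5) and (2.8) it follows that $M \in \mathcal{M}_\lambda$, since 

(2.14)  $M_t = \sum_{i=1}^K \int_0^t X^i_s \, dN^i_s, \quad t \ge 0.$ 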
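
The step leading to (2.14) can be spelled out as follows. This is only a sketch of the omitted computation; it substitutes the dynamics (2.8) into the definition (2.4) of B and then uses the definition (2.5) of A:

```latex
\begin{align*}
B_t &= \sum_{i=1}^K \int_0^t X^i_s \, dY^i_s
     = \sum_{i=1}^K \int_0^t X^i_s
       \Bigl( \lambda \sum_{j=1}^K X^j_s \, d\langle Y^i, Y^j \rangle_s + dN^i_s \Bigr) \\
    &= \lambda \sum_{i=1}^K \sum_{j=1}^K \int_0^t X^i_s X^j_s \, d\langle Y^i, Y^j \rangle_s
       + \sum_{i=1}^K \int_0^t X^i_s \, dN^i_s
     = \lambda A_t + \sum_{i=1}^K \int_0^t X^i_s \, dN^i_s .
\end{align*}
```

Subtracting $\lambda A_t$ from both sides gives the stochastic-integral representation of the score $M_t = B_t - \lambda A_t$, which is a continuous $P_\lambda$-local martingale because each $N^i$ is one.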

Since $\langle M, M \rangle = \langle B, B \rangle = A$, if we also assume that for every $\lambda \in \mathbb{R}$ 

(A3)  $P_\lambda\bigl( \lim_{t \to \infty} A_t = \infty \bigr) = 1,$ 

then there exists a $P_\lambda$-Brownian motion W (see [11], page 174) so that 

(2.15)  $P_\lambda(M_t = W_{A_t},\ t \ge 0) = 1.$ 

This representation has some important consequences, which we state in the 
following lemma. 

Lemma 2.1. (a) If $(t_\gamma)_{\gamma>0}$ is an increasing family of (possibly random) 
times so that $t_\gamma \to \infty$ $P_\lambda$-a.s., then $\hat\lambda_{t_\gamma} \to \lambda$ $P_\lambda$-a.s. as $\gamma \to \infty$. 

(b) If $T_1 \le T_2$ are $\{\mathcal{F}_t\}$-stopping times so that $E_\lambda[A_{T_2}] < \infty$, then 

(2.16)  $E_\lambda[M_{T_1}] = E_\lambda[M_{T_2}] = 0,$ 

(2.17)  $E_\lambda[(M_{T_2} - M_{T_1})^2] = E_\lambda[A_{T_2} - A_{T_1}].$ 

(c) If $\{A_t\}$ is deterministic, then 

(2.18)  $\sqrt{A_t}\,(\hat\lambda_t - \lambda) \sim \mathcal{N}(0,1) \quad \forall\, t > 0.$ 

Proof. Part (a) is a consequence of (2.13), (2.15) and the strong law of 
large numbers for the Brownian motion. Part (b) follows from a localization 
argument, the optional sampling theorem and Doob's maximal inequality. Finally, 
when $\{A_t\}$ is deterministic, from (2.15) it follows that $M_t \sim \mathcal{N}(0, A_t)$ 
for every $t > 0$. From this observation and (2.13) we obtain (2.18). □ 

In the following lemma we state a version of the Cramér-Rao-Wolfowitz 
inequality. 

Lemma 2.2. If T is an $\{\mathcal{F}_t\}$-stopping time and $\phi$ is an $\mathcal{F}_T$-measurable 
statistic so that $0 < E_\lambda[A_T] < \infty$ and $E_\lambda[\phi] = \lambda$, $V_\lambda[\phi] < \infty$ for every $\lambda \in \mathbb{R}$, 
then 

       $V_\lambda[\phi] \ge \frac{1}{E_\lambda[A_T]}.$ 

Proof. From (2.16) and (2.17) and the Cauchy-Schwarz inequality we 
have 

       $E_\lambda[\phi M_T] = E_\lambda[(\phi - \lambda) M_T] \le \sqrt{E_\lambda[(\phi - \lambda)^2]\, E_\lambda[(M_T)^2]} = \sqrt{V_\lambda[\phi]\, E_\lambda[A_T]}.$ 

Thus, it suffices to show that $E_\lambda[\phi M_T] = 1$. Indeed, changing the measure 
$P_\lambda \mapsto P_0$ and differentiating both sides in $E_\lambda[\phi] = \lambda$ with respect to $\lambda$, 

       $1 = \frac{d}{d\lambda} E_0[e^{\lambda B_T - (\lambda^2/2) A_T} \phi] = E_0[e^{\lambda B_T - (\lambda^2/2) A_T} M_T \phi] = E_\lambda[M_T \phi].$ 

The second equality follows from interchanging derivative and expectation, 
which is possible due to the (quadratic) form of the log-likelihood function 
(2.11) (see, e.g., [12], page 54). □ 

Lemma 2.2 and (2.18) imply that when $A_t$ is deterministic, $\hat\lambda_t$ is an optimal 
estimator of $\lambda$, in the sense that it has the smallest possible variance 






G. FELLOURIS 



among $\mathcal{F}_t$-measurable, unbiased estimators (for any fixed $t > 0$). In order to 
obtain such an exact optimality property when $\{A_t\}$ is random, we consider 
the following sequential version of the centralized MLE: 

(2.19)  $S_\gamma := \inf\{t \ge 0 : A_t \ge \gamma\}, \quad \hat\lambda_{S_\gamma} = \Bigl( \frac{B}{A} \Bigr)_{S_\gamma}, \quad \gamma > 0.$ 

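Lemma 2.3. For every $\gamma > 0$, 

(2.20)  $P_\lambda(S_\gamma < \infty) = 1,$ 

(2.21)  $\sqrt{\gamma}\,(\hat\lambda_{S_\gamma} - \lambda) \sim \mathcal{N}(0,1).$ 

Moreover, $P_\lambda(\hat\lambda_{S_\gamma} \to \lambda) = 1$ as $\gamma \to \infty$. 

Proof. Assumption (A3) implies (2.20). Since A has continuous paths, 
$A_{S_\gamma} = \gamma$. Thus, from (2.15) we have $M_{S_\gamma} \sim \mathcal{N}(0, \gamma)$ and, consequently, from 
(2.13) we obtain (2.21). Finally, the strong consistency of $\hat\lambda_{S_\gamma}$ as $\gamma \to \infty$ is 
implied by Lemma 2.1(a). □ 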
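
The distributional claim (2.21) can be written out in one line. The following is a sketch that combines the representation (2.13), the time-change (2.15) and the identity $A_{S_\gamma} = \gamma$ used in the proof; it introduces no new assumptions:

```latex
\[
\sqrt{\gamma}\,\bigl(\hat\lambda_{S_\gamma} - \lambda\bigr)
  = \sqrt{\gamma}\,\frac{M_{S_\gamma}}{A_{S_\gamma}}
  = \frac{W_{A_{S_\gamma}}}{\sqrt{\gamma}}
  = \frac{W_{\gamma}}{\sqrt{\gamma}}
  \sim \mathcal{N}(0,1).
\]
```

The normality is exact for every $\gamma > 0$, not only in the limit, because the stopping rule fixes the accumulated Fisher information at the deterministic level $\gamma$.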

From Lemmas 2.2 and 2.3 it follows that, for any given $\gamma > 0$, $\hat\lambda_{S_\gamma}$ is an 
optimal estimator of $\lambda$, in the sense that it has the smallest possible variance 
among unbiased, $\{\mathcal{F}_t\}$-adapted estimators $(T_\gamma, \phi_\gamma)$ for which $E_\lambda[A_{T_\gamma}] \le \gamma$. 

Therefore, there is always a centralized estimator of A that is unbiased, 
normally distributed and optimal in a nonasymptotic sense. A decentralized 
estimator cannot have such a strong optimality property, as it relies on less 
information. However, we will say that a (decentralized) estimator is asymp- 
totically optimal, if it has the same distribution as the corresponding optimal 
centralized estimator when an asymptotically large horizon of observations 
is available. More specifically, 

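(a) when $\{A_t\}$ is deterministic, a fixed-horizon, $\{\mathcal{G}_t\}$-adapted estimator 
$(\phi_t)_{t>0}$ will be asymptotically optimal if 

       $\sqrt{A_t}\,(\phi_t - \lambda) \to \mathcal{N}(0,1)$ as $t \to \infty$, 

(b) when $\{A_t\}$ is random, a sequential, $\{\mathcal{G}_t\}$-adapted estimator $(T_\gamma, \phi_\gamma)_{\gamma>0}$ 
will be asymptotically optimal if 

       $\limsup_{\gamma \to \infty} (E_\lambda[A_{T_\gamma}] - \gamma) \le 0$ and $\sqrt{\gamma}\,(\phi_\gamma - \lambda) \to \mathcal{N}(0,1)$ as $\gamma \to \infty$. 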

2.3. Notation. We close this section with some notation that will be 
useful in the construction and analysis of the proposed estimating scheme. 
Thus, for every $1 \le i \le K$ we define the statistic 

(2.22)  $B^i_t := \int_0^t X^i_s \, dY^i_s, \quad t \ge 0,$ 

and for any $1 \le i, j \le K$ we denote by $A^{ij}$ the quadratic covariation of $B^i$ 
and $B^j$ and by $A^i$ the quadratic variation of $B^i$, that is, 

(2.23)  $A^{ij}_t := \langle B^i, B^j \rangle_t = \int_0^t X^i_s X^j_s \, d\langle Y^i, Y^j \rangle_s, \quad t \ge 0,$ 

(2.24)  $A^i_t := \langle B^i, B^i \rangle_t = \int_0^t |X^i_s|^2 \, d\langle Y^i, Y^i \rangle_s, \quad t \ge 0.$ 

Then, recalling the definitions of B and A in (2.4) and (2.5), we have 

(2.25)  $B = \sum_{i=1}^K B^i, \quad A = \sum_{i=1}^K A^i + \sum_{1 \le i \ne j \le K} A^{ij}.$ 

Moreover, we define the set 

(2.26)  $\mathcal{D} := \{(i,j) \mid 1 \le i \ne j \le K \text{ and } A^{ij} \text{ is random}\}$ 

and we have the following representation for A: 

(2.27)  $A = \sum_{i=1}^K A^i + \sum_{(i,j) \in \mathcal{D}} A^{ij} + \sum_{(i,j) \notin \mathcal{D}} A^{ij}.$ 

3. A decentralized estimating scheme. In this section we construct and 
analyze the proposed decentralized estimator. More specifically, we first define 
the communication scheme at the sensors and then introduce the statistics 
and estimators that will be used by the fusion center. As in the centralized 
setup, we distinguish two cases and consider a fixed-horizon estimator 
when $\{A_t\}$ is deterministic and a sequential estimator when $\{A_t\}$ is random. 
In each case, we analyze the asymptotic behavior of the resulting estimator 
as the horizon of observations goes to infinity and the rate of communication 
goes to zero, assuming that conditions (A1), (A2), (A3) are satisfied. 

The main idea in the suggested communication scheme is that each sensor 
should inform the fusion center about the sufficient statistics for $\lambda$ that it 
observes locally. However, instead of communicating at deterministic times, its 
communication times should be triggered by its local observations. In other 
words, each sensor i should inform the fusion center about the evolution of 
the $\{\mathcal{F}^i_t\}$-adapted, sufficient statistics for $\lambda$ at a sequence of $\{\mathcal{F}^i_t\}$-stopping 
times. 

When A is deterministic, $B^1, \ldots, B^K$ are the only sufficient statistics for 
$\lambda$ and it is clear that each $B^i$ is $\{\mathcal{F}^i_t\}$-adapted, thus observable at sensor i. 

When A is random, there are additional sufficient statistics, the random 
processes of the form $A^i$ or $A^{ij}$ (when $A^i$ or $A^{ij}$ is deterministic, it is 
completely known to the fusion center at any time t). If $A^i$ is random, it is clear 
that it is $\{\mathcal{F}^i_t\}$-adapted, since $\langle B^i, B^i \rangle = A^i$. On the other hand, if $A^{ij}$ (with 
$i \ne j$) is random, it is not locally observed either at sensor i or at sensor j, 
thus, the fusion center cannot be informed about its evolution (since there 
is no communication between sensors). 

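3.1. Communication scheme and fusion center statistics. Based on the 
previous discussion, we suggest that each sensor i communicate with the 
fusion center at the times 

(3.1)  $\tau^{i,B}_n := \inf\{t \ge \tau^{i,B}_{n-1} : B^i_t - B^i_{\tau^{i,B}_{n-1}} \notin (-\underline\Delta^i, \bar\Delta^i)\}, \quad n \in \mathbb{N},$ 

and, if A and $A^i$ are random, also at the times 

(3.2)  $\tau^{i,A}_n := \inf\{t \ge \tau^{i,A}_{n-1} : A^i_t - A^i_{\tau^{i,A}_{n-1}} \ge c^i\}, \quad n \in \mathbb{N},$ 

where $\tau^{i,B}_0 = \tau^{i,A}_0 := 0$ and $c^i, \bar\Delta^i, \underline\Delta^i > 0$ are arbitrary, constant thresholds, 
chosen by the designer of the scheme, known both at sensor i and the fusion 
center. If either $A^i$ or A is deterministic, sensor i does not communicate at 
the times $(\tau^{i,A}_n)$ and we set $\tau^{i,A}_n = \infty$ for every $n \ge 1$. 

At $\tau^{i,B}_n$, sensor i transmits to the fusion center with one bit the outcome 
of the Bernoulli random variable 

(3.3)  $z^i_n := \begin{cases} 1, & \text{if } B^i_{\tau^{i,B}_n} - B^i_{\tau^{i,B}_{n-1}} \ge \bar\Delta^i, \\ 0, & \text{if } B^i_{\tau^{i,B}_n} - B^i_{\tau^{i,B}_{n-1}} \le -\underline\Delta^i, \end{cases}$ 

whereas at $\tau^{i,A}_n$, if needed, it informs the fusion center with one bit that $A^i$ 
has increased by $c^i$ since $\tau^{i,A}_{n-1}$. Therefore, the induced filtration at the fusion 
center is 

(3.4)  $\tilde{\mathcal{F}}_t := \sigma(\tau^{i,B}_n, \tau^{i,A}_n, z^i_n \mid \tau^{i,B}_n \le t,\ \tau^{i,A}_n \le t,\ i = 1, \ldots, K), \quad t \ge 0,$ 

which means that the fusion center can compute any $\{\tilde{\mathcal{F}}_t\}$-adapted statistic. 
For every $1 \le i \le K$ we define 

(3.5)  $\tilde A^i_t := n c^i, \quad \tau^{i,A}_n \le t < \tau^{i,A}_{n+1},\ n \in \mathbb{N} \cup \{0\},$ 

(3.6)  $\tilde B^i_t := \sum_{j=1}^n [\bar\Delta^i z^i_j - \underline\Delta^i (1 - z^i_j)], \quad \tau^{i,B}_n \le t < \tau^{i,B}_{n+1},\ n \ge 1,$ 

where $\tilde B^i_t := 0$ for $t < \tau^{i,B}_1$, with the understanding that $\tilde A^i := A^i$ when $A^i$ is 
deterministic. Moreover, motivated by (2.25)-(2.27), we define 

(3.7)  $\tilde B := \sum_{i=1}^K \tilde B^i,$ 

(3.8)  $\tilde A := \sum_{i=1}^K \tilde A^i + \sum_{i=1}^K d_i \tilde A^i + \sum_{(i,j) \notin \mathcal{D}} A^{ij} = \sum_{i=1}^K (1 + d_i) \tilde A^i + \sum_{(i,j) \notin \mathcal{D}} A^{ij},$ 

where $d_i$ is the number of random terms of the form $A^{ij}$, that is, 

(3.9)  $d_i := \#\{j : 1 \le j \le K,\ j \ne i \text{ and } A^{ij} \text{ is random}\}.$ 

Again, we set $\tilde A := A$ when A is deterministic. Finally, we define the following 
quantities: 

(3.10)  $\Delta := \sum_{i=1}^K \max\{\bar\Delta^i, \underline\Delta^i\}, \quad c := \sum_{i=1}^K (1 + d_i) c^i,$ 

which will play an important role in the asymptotic analysis of the proposed 
estimating scheme. 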
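
The one-bit scheme in (3.1), (3.3) and (3.6) can be sketched in code as follows. This is a minimal illustration and not the paper's implementation: the function names `encode_sensor` and `decode_fusion` are hypothetical, and the sensor is assumed to see $B^i$ only on a discrete grid, so a threshold may be overshot between samples, whereas for continuous paths the increment hits the threshold exactly.

```python
from typing import List, Tuple


def encode_sensor(b_path: List[float], delta_up: float,
                  delta_down: float) -> List[Tuple[int, int]]:
    """Sensor side: return (sample_index, bit) for each transmitted message.

    A message is sent when the increment of B^i since the previous message
    leaves (-delta_down, delta_up); bit 1 marks an upper exit, bit 0 a lower one.
    """
    messages = []
    last = 0.0  # value of B^i at the previous communication time (B^i_0 = 0)
    for k, b in enumerate(b_path):
        increment = b - last
        if increment >= delta_up:
            messages.append((k, 1))
            last = b
        elif increment <= -delta_down:
            messages.append((k, 0))
            last = b
    return messages


def decode_fusion(bits: List[int], delta_up: float, delta_down: float) -> float:
    """Fusion-center side: rebuild the approximation of B^i from the bits, as in (3.6)."""
    return sum(delta_up if z == 1 else -delta_down for z in bits)


if __name__ == "__main__":
    # A toy path whose increments hit the thresholds exactly (no overshoot).
    path = [0.0, 0.5, 1.0, 0.5, 0.0, -1.0, 0.0, 1.0]
    msgs = encode_sensor(path, delta_up=1.0, delta_down=1.0)
    print(msgs, decode_fusion([z for _, z in msgs], 1.0, 1.0))
```

Only the bits are needed to rebuild the approximation; the message times matter to the fusion center only through the order in which the bits arrive.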

Lemma 3.1. For every $1 \le i \le K$ and $t, c, \Delta > 0$, 

(3.11)  $0 \le A^i_t - \tilde A^i_t \le c^i, \quad |B^i_t - \tilde B^i_t| \le \max\{\bar\Delta^i, \underline\Delta^i\},$ 

(3.12)  $A_t - \tilde A_t \le c, \quad |B_t - \tilde B_t| \le \Delta.$ 

Proof. If $A^i$, A are deterministic, then $\tilde A^i := A^i$, $\tilde A := A$ and the 
corresponding inequalities hold trivially. Thus, without loss of generality, we 
assume that both $A^i$ and A are random. 

First of all, we observe that $\tilde B^i$ is exactly equal to $B^i$ at $\tau^{i,B}_n$ and $\tilde A^i$ is 
exactly equal to $A^i$ at $\tau^{i,A}_n$ for every $n \in \mathbb{N}$. Indeed, due to the path continuity 
of $A^i$ and $B^i$, for every $n \in \mathbb{N}$ it is 

       $A^i_{\tau^{i,A}_n} = \sum_{j=1}^n [A^i_{\tau^{i,A}_j} - A^i_{\tau^{i,A}_{j-1}}] = n c^i = \tilde A^i_{\tau^{i,A}_n},$ 

       $\tilde B^i_{\tau^{i,B}_n} = \sum_{j=1}^n [\bar\Delta^i z^i_j - \underline\Delta^i (1 - z^i_j)] = \sum_{j=1}^n [B^i_{\tau^{i,B}_j} - B^i_{\tau^{i,B}_{j-1}}] = B^i_{\tau^{i,B}_n}.$ 

Moreover, from the definition of the communication times $(\tau^{i,B}_n)_n$, it is clear 
that $|B^i_t - \tilde B^i_t| \le \max\{\bar\Delta^i, \underline\Delta^i\}$ for any time t between two jump times of $\tilde B^i$, 
which proves the second inequality in (3.11). Similarly, from the definition 
of $(\tau^{i,A}_n)_n$ and the fact that $A^i$ has increasing paths, it is clear that 
$0 \le A^i_t - \tilde A^i_t \le c^i$ for any time t between two jump times of $\tilde A^i$, which proves the 
first inequality in (3.11). 

The second inequality in (3.12) follows directly from the second inequality 
in (3.11) and the definition of $\Delta$. Finally, from the Kunita-Watanabe 
inequality (see [11], page 142) and the algebraic inequality $2\sqrt{|xy|} \le |x| + |y|$ 
we have 

       $|A^{ij}| \le \sqrt{A^i A^j} \le \tfrac{1}{2}(A^i + A^j), \quad 1 \le i \ne j \le K,$ 

thus, from the definitions of $\mathcal{D}$ and $d_i$ [recall (2.26) and (3.9)] we obtain 

       $\sum_{(i,j) \in \mathcal{D}} A^{ij} \le \frac{1}{2} \sum_{(i,j) \in \mathcal{D}} (A^i + A^j) = \sum_{i=1}^K d_i A^i.$ 

From the representation of A in (2.27) and the latter inequality we have 

       $A \le \sum_{i=1}^K A^i + \sum_{i=1}^K d_i A^i + \sum_{(i,j) \notin \mathcal{D}} A^{ij} \le \sum_{i=1}^K (1 + d_i)(\tilde A^i + c^i) + \sum_{(i,j) \notin \mathcal{D}} A^{ij} = \tilde A + c,$ 

where the second inequality is due to (3.11) and the equality follows from 
the definitions of $\tilde A$ and c in (3.8) and (3.10), respectively. □ 

3.2. The proposed estimator. The proposed communication scheme re- 
quires the transmission of only one bit whenever a sensor communicates 
with the fusion center. Thus, the overall communication activity in the net- 
work will be low as long as the communication rate of each sensor is low. 
Therefore, we should ideally design an $\{\tilde{\mathcal{F}}_t\}$-adapted estimator that is 
statistically efficient even under an asymptotically low communication rate as 
the horizon of observations goes to infinity. For this reason, we let $\Delta \to \infty$ 
and $c \to \infty$ as $t \to \infty$ (or $\gamma \to \infty$) and we determine the relative rates that 
guarantee consistency and asymptotic optimality. 

When $\{A_t\}$ is deterministic, we suggest the following estimator of $\lambda$ at 
some arbitrary, deterministic time $t > 0$: 

(3.13)  $\tilde\lambda_t := \frac{\tilde B_t}{A_t}.$ 

In the following theorem, which is the first main result of this paper, we show 
that $\{\tilde\lambda_t\}$ is consistent and asymptotically optimal under an asymptotically 
low communication rate. 

Theorem 3.1. If $t, \Delta \to \infty$ so that $\Delta = o(A_t)$, then $\tilde\lambda_t$ converges to $\lambda$ 
almost surely and in mean square. If additionally $\Delta = o(\sqrt{A_t})$, then $\tilde\lambda_t$ is 
asymptotically optimal, that is, $\sqrt{A_t}\,(\tilde\lambda_t - \lambda) \to \mathcal{N}(0,1)$. 

Proof. Since $\hat\lambda_t$ converges to $\lambda$ almost surely and in mean square as 
$t \to \infty$, in order to prove that $\tilde\lambda_t$ is consistent, it suffices to show that 
$P_\lambda(|\tilde\lambda_t - \hat\lambda_t| \to 0) = 1$ and $E_\lambda[(\tilde\lambda_t - \hat\lambda_t)^2] \to 0$ as $t, \Delta \to \infty$ so that $\Delta = o(A_t)$. 
Moreover, since $\sqrt{A_t}\,(\hat\lambda_t - \lambda) \sim \mathcal{N}(0,1)$ for any $t > 0$, in order to establish 
the asymptotic optimality of $\tilde\lambda_t$, it suffices to show that $\sqrt{A_t}\,|\tilde\lambda_t - \hat\lambda_t|$ 
converges to 0 in probability as $t, \Delta \to \infty$ so that $\Delta = o(\sqrt{A_t})$. 

Indeed, from the second inequality in (3.12) we have 

       $|\tilde\lambda_t - \hat\lambda_t| = \frac{|\tilde B_t - B_t|}{A_t} \le \frac{\Delta}{A_t}, \quad t > 0,$ 

which proves both claims. □ 
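
As an illustration of the two rate conditions in Theorem 3.1, consider the threshold schedule $\Delta = A_t^{1/4}$. This particular schedule is a hypothetical choice made here for concreteness and is not prescribed by the text. Substituting it into the bound $|\tilde\lambda_t - \hat\lambda_t| \le \Delta / A_t$ gives:

```latex
\[
|\tilde\lambda_t - \hat\lambda_t| \le \frac{A_t^{1/4}}{A_t} = A_t^{-3/4} \to 0,
\qquad
\sqrt{A_t}\,|\tilde\lambda_t - \hat\lambda_t| \le \frac{A_t^{1/4}}{\sqrt{A_t}} = A_t^{-1/4} \to 0
\quad \text{as } A_t \to \infty .
\]
```

Both conditions hold, so this schedule yields consistency and asymptotic optimality. By contrast, $\Delta = \sqrt{A_t}$ satisfies $\Delta = o(A_t)$ but not $\Delta = o(\sqrt{A_t})$, so the bound then guarantees consistency only.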



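When $\{A_t\}$ is random, we suggest the following sequential, $\{\tilde{\mathcal{F}}_t\}$-adapted 
estimator of $\lambda$: 

(3.14)  $\tilde S_\gamma := \inf\{t \ge 0 : \tilde A_t \ge \gamma - c\}, \quad \tilde\lambda_{\tilde S_\gamma} := \Bigl( \frac{\tilde B}{\tilde A} \Bigr)_{\tilde S_\gamma}, \quad \gamma > c.$ 

Lemma 3.2. For any $\gamma, c$ such that $\gamma > c$, 

(3.15)  $P_\lambda(\tilde S_\gamma \le S_\gamma < \infty) = 1,$ 

(3.16)  $P_\lambda(A_{\tilde S_\gamma} \le \gamma) = 1,$ 

(3.17)  $E_\lambda[(M_{\tilde S_\gamma})^2] \le \gamma.$ 

Moreover, if $c, \gamma \to \infty$ so that $c = o(\gamma)$, then 

(3.18)  $\limsup (E_\lambda[A_{\tilde S_\gamma}] - \gamma) \le 0.$ 

Proof. From the first inequality in (3.12) we have $\tilde A \ge A - c$, therefore, 

(3.19)  $\tilde S_\gamma \le \inf\{t \ge 0 : A_t - c \ge \gamma - c\} = S_\gamma.$ 

From this inequality and (2.20) we obtain (3.15). Moreover, since A is the 
quadratic variation of B, it has continuous and increasing paths, thus, from 
(3.15) we obtain $P_\lambda(A_{\tilde S_\gamma} \le A_{S_\gamma} = \gamma) = 1$. Finally, from (2.17) and (3.16) we 
obtain 

       $E_\lambda[(M_{\tilde S_\gamma})^2] = E_\lambda[A_{\tilde S_\gamma}] \le \gamma,$ 

which proves (3.17) and implies (3.18). □ 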

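In the following theorem we show that $\tilde\lambda_{\tilde S_\gamma}$ is a consistent estimator of $\lambda$, 
even under an asymptotically low communication rate. 

Theorem 3.2. $P_\lambda(\tilde\lambda_{\tilde S_\gamma} \to \lambda) = 1$ and $E_\lambda[(\tilde\lambda_{\tilde S_\gamma} - \lambda)^2] \to 0$ as $\gamma, c, \Delta \to \infty$ 
so that $c, \Delta = o(\gamma)$. 

Proof. Recalling from (2.12) that $B = \lambda A + M$, we have $P_\lambda$-a.s. 

       $\Bigl( \frac{\tilde B}{\tilde A} \Bigr)_{\tilde S_\gamma} = \Bigl( \frac{\tilde B - B}{\tilde A} \Bigr)_{\tilde S_\gamma} + \Bigl( \frac{B}{\tilde A} \Bigr)_{\tilde S_\gamma} = \Bigl( \frac{\tilde B - B}{\tilde A} \Bigr)_{\tilde S_\gamma} + \lambda \Bigl( \frac{A}{\tilde A} \Bigr)_{\tilde S_\gamma} + \Bigl( \frac{M}{\tilde A} \Bigr)_{\tilde S_\gamma}$ 

and, consequently, 

(3.20)  $\tilde\lambda_{\tilde S_\gamma} - \lambda = \Bigl( \frac{\tilde B - B}{\tilde A} \Bigr)_{\tilde S_\gamma} + \lambda \Bigl( \frac{A - \tilde A}{\tilde A} \Bigr)_{\tilde S_\gamma} + \Bigl( \frac{M}{\tilde A} \Bigr)_{\tilde S_\gamma}.$ 

From the definition of $\tilde S_\gamma$ it follows that $\tilde A_{\tilde S_\gamma} \ge \gamma - c$, whereas from (3.12) 
we have $|\tilde B - B|_{\tilde S_\gamma} \le \Delta$ and $(A - \tilde A)_{\tilde S_\gamma} \le c$. Therefore, 

(3.21)  $|\tilde\lambda_{\tilde S_\gamma} - \lambda| \le \frac{\Delta + |\lambda| c}{\gamma - c} + \frac{|M_{\tilde S_\gamma}|}{\gamma - c}.$ 

The first term in the right-hand side clearly goes to 0 as $c, \Delta, \gamma \to \infty$ so that 
$c, \Delta = o(\gamma)$. Moreover, from (2.15) and (3.16) we have $P_\lambda$-a.s. 

(3.22)  $\frac{|M_{\tilde S_\gamma}|}{\gamma - c} = \frac{|W_{A_{\tilde S_\gamma}}|}{\gamma - c} \le \frac{|W_{A_{\tilde S_\gamma}}|}{A_{\tilde S_\gamma}} \cdot \frac{\gamma}{\gamma - c}.$ 

If $c, \gamma \to \infty$ so that $c = o(\gamma)$, $P_\lambda(A_{\tilde S_\gamma} \to \infty) = 1$, due to assumption (A3). 
Therefore, the strong law of large numbers implies that the right-hand side 
in (3.22) converges to 0 and, consequently, $P_\lambda(\tilde\lambda_{\tilde S_\gamma} \to \lambda) = 1$ as $c, \gamma \to \infty$ so 
that $c = o(\gamma)$. 

Moreover, if we square both sides in (3.21), apply the algebraic inequality 
$(x + y)^2 \le 2(x^2 + y^2)$, take expectations and use (3.17), we obtain 

       $E_\lambda[(\tilde\lambda_{\tilde S_\gamma} - \lambda)^2] \le 2 \Bigl( \frac{\Delta + |\lambda| c}{\gamma - c} \Bigr)^2 + \frac{2\gamma}{(\gamma - c)^2},$ 

which implies that $E_\lambda[(\tilde\lambda_{\tilde S_\gamma} - \lambda)^2] \to 0$ as $c, \Delta, \gamma \to \infty$ so that $c, \Delta = o(\gamma)$. 
□ 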

The consistency of $\tilde\lambda_{\tilde S_\gamma}$ was established without any additional conditions 
on the dynamics of the sensor processes. However, it is clear that the suggested 
estimator cannot be asymptotically efficient in such a general setup, 
since it does not have any access to sufficient statistics of the form $A^{ij}$ with 
$(i,j) \in \mathcal{D}$. 

Nevertheless, if every $A^{ij}$ with $i \ne j$ is deterministic, then $\mathcal{D} = \emptyset$ and 
the fusion center has access to all sufficient statistics for $\lambda$. In this case, 
we can obtain an asymptotically sharp lower bound for $A_{\tilde S_\gamma}$, the observed 
Fisher information that is utilized by the proposed estimator, which allows 
us to establish its asymptotic optimality even under an asymptotically low 
communication rate. 



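Lemma 3.3. If $\mathcal{D} = \emptyset$, then $\tilde A_t \le A_t$ for every $t \ge 0$. Consequently, for 
every $\gamma, c$ such that $\gamma > c$, 

(3.23)  $P_\lambda(A_{\tilde S_\gamma} \ge \gamma - c) = 1,$ 

(3.24)  $E_\lambda[(M_{S_\gamma} - M_{\tilde S_\gamma})^2] \le c.$ 

Proof. If $\mathcal{D} = \emptyset$, then $d_i = 0$ for every $1 \le i \le K$, thus, from (2.25), 
(3.8) and the first inequality in (3.11) we obtain 

       $\tilde A = \sum_{i=1}^K \tilde A^i + \sum_{1 \le i \ne j \le K} A^{ij} \le \sum_{i=1}^K A^i + \sum_{1 \le i \ne j \le K} A^{ij} = A.$ 

Then, from the definition of $\tilde S_\gamma$ we have $P_\lambda(A_{\tilde S_\gamma} \ge \tilde A_{\tilde S_\gamma} \ge \gamma - c) = 1$, which 
proves (3.23). Finally, from (2.17), (3.19) and (3.23) we obtain 

       $E_\lambda[(M_{S_\gamma} - M_{\tilde S_\gamma})^2] = E_\lambda[A_{S_\gamma} - A_{\tilde S_\gamma}] = E_\lambda[\gamma - A_{\tilde S_\gamma}] \le c,$ 

which completes the proof. □ 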

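Theorem 3.3. If $\mathcal{D} = \emptyset$, then $\sqrt{\gamma}\,(\tilde\lambda_{\tilde S_\gamma} - \lambda) \to \mathcal{N}(0,1)$ as $c, \Delta, \gamma \to \infty$ 
so that $c, \Delta = o(\sqrt{\gamma})$. 

Proof. Since $\sqrt{\gamma}\,(\hat\lambda_{S_\gamma} - \lambda) \sim \mathcal{N}(0,1)$ for every $\gamma > 0$, it suffices to show 
that $\sqrt{\gamma}\,|\tilde\lambda_{\tilde S_\gamma} - \hat\lambda_{S_\gamma}|$ converges to zero in probability as $\gamma, c, \Delta \to \infty$ so that 
$c, \Delta = o(\sqrt{\gamma})$. Indeed, from (2.13) and (3.20) we have $P_\lambda$-a.s. 

       $\tilde\lambda_{\tilde S_\gamma} - \hat\lambda_{S_\gamma} = \Bigl( \frac{\tilde B - B}{\tilde A} \Bigr)_{\tilde S_\gamma} + \lambda \Bigl( \frac{A - \tilde A}{\tilde A} \Bigr)_{\tilde S_\gamma} + \Bigl[ \Bigl( \frac{M}{\tilde A} \Bigr)_{\tilde S_\gamma} - \Bigl( \frac{M}{A} \Bigr)_{S_\gamma} \Bigr].$ 

Since $\tilde A_{\tilde S_\gamma} \ge \gamma - c$ and from (3.12) we have $|\tilde B - B|_{\tilde S_\gamma} \le \Delta$ and $(A - \tilde A)_{\tilde S_\gamma} \le c$, 

(3.25)  $\sqrt{\gamma}\,|\tilde\lambda_{\tilde S_\gamma} - \hat\lambda_{S_\gamma}| \le \sqrt{\gamma}\, \frac{\Delta + |\lambda| c}{\gamma - c} + \sqrt{\gamma}\, \Bigl| \Bigl( \frac{M}{\tilde A} \Bigr)_{\tilde S_\gamma} - \Bigl( \frac{M}{A} \Bigr)_{S_\gamma} \Bigr|.$ 

The first term in the right-hand side of (3.25) converges to 0 as $c, \Delta, \gamma \to \infty$ 
so that $c, \Delta = o(\sqrt{\gamma})$. Moreover, since $A_{S_\gamma} = \gamma$ and $\tilde A_{\tilde S_\gamma} \ge \gamma - c$, 

(3.26)  $\Bigl| \Bigl( \frac{M}{\tilde A} \Bigr)_{\tilde S_\gamma} - \Bigl( \frac{M}{A} \Bigr)_{S_\gamma} \Bigr| \le |M_{\tilde S_\gamma}| \Bigl| \frac{1}{\tilde A_{\tilde S_\gamma}} - \frac{1}{\gamma} \Bigr| + \frac{1}{\gamma} |M_{\tilde S_\gamma} - M_{S_\gamma}| \le |M_{\tilde S_\gamma}| \frac{c}{\gamma(\gamma - c)} + \frac{|M_{\tilde S_\gamma} - M_{S_\gamma}|}{\gamma}.$ 

From the Cauchy-Schwarz inequality, (3.17) and (3.24) we have 

       $E_\lambda[|M_{\tilde S_\gamma}|] \le \sqrt{E_\lambda[M^2_{\tilde S_\gamma}]} \le \sqrt{\gamma},$ 

       $E_\lambda[|M_{\tilde S_\gamma} - M_{S_\gamma}|] \le \sqrt{E_\lambda[(M_{\tilde S_\gamma} - M_{S_\gamma})^2]} \le \sqrt{c}.$ 

Then, taking expectations in (3.26), we obtain 

       $\sqrt{\gamma}\, E_\lambda \Bigl[ \Bigl| \Bigl( \frac{M}{\tilde A} \Bigr)_{\tilde S_\gamma} - \Bigl( \frac{M}{A} \Bigr)_{S_\gamma} \Bigr| \Bigr] \le \frac{c}{\gamma - c} + \sqrt{\frac{c}{\gamma}}.$ 

Therefore, the second term in the right-hand side of (3.25) converges to 0 
in probability, due to Markov's inequality, as $c, \Delta, \gamma \to \infty$ so that $c = o(\gamma)$. 
This concludes the proof. □ 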



Corollary 3.1. If $\mathcal{D} = \emptyset$, then $(\tilde S_\gamma, \tilde\lambda_{\tilde S_\gamma})$ is asymptotically optimal as 
$\gamma, c, \Delta \to \infty$ so that $c, \Delta = o(\sqrt{\gamma})$. 

Proof. This is a consequence of (3.18) and Theorem 3.3. □ 

3.3. Remarks and examples. For the implementation of the proposed estimator, 
the fusion center does not need to record the values of the communication 
times. It simply needs to keep track of $\tilde B^1, \ldots, \tilde B^K$ and, if necessary, 
$\tilde A^1, \ldots, \tilde A^K$, and update them whenever it receives a relevant message. Since 
these statistics are defined recursively, at most 2K values need to be stored 
at any given time. 
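
The recursive bookkeeping described above can be sketched as a small class. This is an illustrative sketch rather than the paper's implementation: the class name `FusionCenter` and its methods are hypothetical, and it assumes the simplest setting in which every $A^i$ is random and every cross term $A^{ij}$ with $i \ne j$ vanishes, so that $\tilde A = \sum_i \tilde A^i$ and $c = \sum_i c^i$.

```python
from typing import Sequence


class FusionCenter:
    """Stores one running value of B-tilde^i and A-tilde^i per sensor (2K numbers)."""

    def __init__(self, delta_up: Sequence[float], delta_down: Sequence[float],
                 c: Sequence[float]) -> None:
        self.delta_up = list(delta_up)      # upper thresholds, one per sensor
        self.delta_down = list(delta_down)  # lower thresholds, one per sensor
        self.c = list(c)                    # information thresholds c^i
        self.b_tilde = [0.0] * len(self.c)
        self.a_tilde = [0.0] * len(self.c)

    def on_b_message(self, i: int, bit: int) -> None:
        """One-bit message z from sensor i: update B-tilde^i as in (3.6)."""
        self.b_tilde[i] += self.delta_up[i] if bit == 1 else -self.delta_down[i]

    def on_a_message(self, i: int) -> None:
        """Sensor i reports that A^i grew by c^i: update A-tilde^i as in (3.5)."""
        self.a_tilde[i] += self.c[i]

    def should_stop(self, gamma: float) -> bool:
        """Stopping rule of (3.14): total A-tilde has reached gamma - c."""
        return sum(self.a_tilde) >= gamma - sum(self.c)

    def estimate(self) -> float:
        """Ratio B-tilde / A-tilde, defined once some information has accumulated."""
        total_a = sum(self.a_tilde)
        if total_a <= 0.0:
            raise ValueError("no information has been reported yet")
        return sum(self.b_tilde) / total_a
```

Each message triggers a constant-time update, and no message times are stored, which matches the remark that at most 2K values need to be kept.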

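Theorems 3.1, 3.2 and 3.3 remain valid if c and $\Delta$ are held fixed as $t \to \infty$ 
or $\gamma \to \infty$. Moreover, they remain valid if we use in the definitions of $\tau^{i,B}_n$ 
and $\tau^{i,A}_n$ time-varying, positive thresholds, $\bar\Delta^i_t, \underline\Delta^i_t, c^i_t$, so that 

       $\bar\Delta^i_t \le \bar\Delta^i, \quad \underline\Delta^i_t \le \underline\Delta^i, \quad c^i_t \le c^i, \quad t \ge 0.$ 
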
Therefore, it may be possible to improve the performance of the proposed 
estimator by introducing linear or curved boundaries and optimizing over 
the additional parameters. 

We close this section with some examples that illustrate our main results. 
Thus, let $\sigma_t := [\sigma^{ij}_t]$ be an $\{\mathcal{F}_t\}$-adapted, square matrix of size K, set $a_t := 
\sigma_t \sigma'_t$, where $\sigma'_t$ is the transpose of $\sigma_t$, and consider the following special case 


PARAMETER ESTIMATION UNDER COMMUNICATION CONSTRAINTS 17 
of model (2.8): 

(3.27) Yi = \Y, Xio^ + / dWi, t> 0,1 <i<K, 

where {W^, . . . , W^) is a X-dimensional P;^-Brownian motion. The observed 
Fisher information {At} then becomes 



K K 



(3.28) At = J2J2 f ^sXial^ds, t > 0. 

i=l j=i -^0 

In Theorem 3.1, we stated the asymptotic properties of the proposed
estimator when $A_t$ is deterministic. This assumption is clearly satisfied when
there are real functions $b_i, \rho_{ij} : [0, \infty) \to \mathbb{R}$ so that $X_t^i = b_i(t)$ and $\alpha_t^{ij} = \rho_{ij}(t)$
for every $1 \le i, j \le K$, in which case

(3.29) $A_t = \sum_{i=1}^{K} \sum_{j=1}^{K} \int_0^t b_i(s)\, b_j(s)\, \rho_{ij}(s)\,ds$, $t \ge 0$,

and $(Y^1, \ldots, Y^K)$ is a Gaussian process with independent increments.
However, Theorem 3.1 also applies when $X_t^i = b_i(t)/Y_t^i$ and $\alpha_t^{ij} = \rho_{ij}(t)\, Y_t^i Y_t^j$, in
which case $A$ is still given by (3.29).

In Theorem 3.3, we proved that the proposed estimator is asymptotically
optimal when $A^{ij}$ is deterministic for every $i \ne j$. This condition is clearly
satisfied when $\sigma^{ij} = 0$ for every $i \ne j$, in which case $Y^1, \ldots, Y^K$ are
independent, $\alpha^{ii} = (\sigma^{ii})^2$ and (3.27), (3.28) become

$Y_t^i = \lambda \int_0^t X_s^i \alpha_s^{ii}\,ds + \int_0^t \sigma_s^{ii}\,dW_s^i$, $t \ge 0$,

$A_t = \sum_{i=1}^{K} \int_0^t (X_s^i)^2 \alpha_s^{ii}\,ds$, $t \ge 0$.

If, in particular, $X^i$ is a nonzero constant and $\alpha^{ii} = Y^i$, then $Y^i$ is a
square-root diffusion, whereas if $X^i = Y^i$ and $\alpha^{ii}$ is a positive constant, then $Y^i$ is
an Ornstein-Uhlenbeck process.
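
As an illustration of the Ornstein-Uhlenbeck example, the following sketch simulates a single sensor ($K = 1$) with $X = Y$ and a constant $\alpha$, and evaluates the ratio $B_t/A_t$ with $B_t = \int_0^t X_s\,dY_s$ and $A_t = \int_0^t X_s^2 \alpha\,ds$. The Euler discretization, the initial value $Y_0 = 1$, the step size and the function name `simulate_ou_mle` are assumptions of the example, not part of the paper.

```python
import math
import random


def simulate_ou_mle(lam: float, alpha: float, t_end: float,
                    dt: float = 0.01, seed: int = 0) -> float:
    """Euler scheme for dY = lam * alpha * Y dt + sqrt(alpha) dW; returns B_t / A_t."""
    rng = random.Random(seed)
    y = 1.0   # assumed initial value
    b = 0.0   # B_t = int_0^t X_s dY_s with X = Y
    a = 0.0   # A_t = int_0^t (X_s)^2 alpha ds
    for _ in range(int(round(t_end / dt))):
        dy = lam * alpha * y * dt + math.sqrt(alpha * dt) * rng.gauss(0.0, 1.0)
        b += y * dy          # Ito sum: integrand evaluated at the left endpoint
        a += y * y * alpha * dt
        y += dy
    return b / a


estimate = simulate_ou_mle(lam=-1.0, alpha=1.0, t_end=200.0, seed=1)
print(estimate)
```

With a negative $\lambda$ the process is mean reverting, and the printed value should be close to the true parameter for a long horizon, up to discretization and sampling error.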

4. The Brownian case. In this section we assume that $\langle Y^i, Y^j \rangle_t = 0$,
$\langle Y^i, Y^i \rangle_t = t$ and $X_t^i = x_i$, where $x_i \ne 0$ is a known constant, for every
$1 \le i \ne j \le K$ and $t \ge 0$.

Thus, $B_t^i = x_i Y_t^i$, $A_t^i = (x_i)^2 t$, $A_t = \sum_{i=1}^{K} A_t^i$ and (2.8) reduces to

$Y_t^i = \lambda x_i t + N_t^i$, $t \ge 0$, $i = 1, \ldots, K$,

where $N^1, \ldots, N^K$ are independent, standard Brownian motions under $P_\lambda$.
Since the filtrations $\{\mathcal{F}_t^1\}, \ldots, \{\mathcal{F}_t^K\}$ are independent, for every $1 \le i \le K$
and $t \ge 0$ we have



(4,1) 



dPr 



A = J2f=i A* and 




We also assume, for simplicity, that $\overline{\Delta}^i = \underline{\Delta}^i = \Delta^i$ for every $1 \le i \le K$, thus,
(4.2) 

(4.3) 

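We denote by $\delta_n^i$ the time between the arrival of the $(n-1)$th and the $n$th
message from sensor $i$ and by $m_t^i$ the number of transmitted messages by
sensor $i$ up to time $t$, that is,

(4.4) $\delta_n^i := \tau_n^{i,B} - \tau_{n-1}^{i,B}$, $m_t^i := \max\{n \in \mathbb{N} : \tau_n^{i,B} \le t\}$.

Since $\{A_t\}$ is deterministic, $\tau_n^{i,A} = \infty$ for every $1 \le i \le K$ and $n \in \mathbb{N}$ and the
fusion center filtration becomes

$\tilde{\mathcal{F}}_t = \sigma(\delta_n^i, z_n^i;\ n \le m_t^i,\ 1 \le i \le K)$, $t \ge 0$.

Moreover, $\tilde A := A$ and $\tilde A^i := A^i$ for every $i$. However, we now define the
following $\{\tilde{\mathcal{F}}_t\}$-adapted statistics:

(4.5) $\check A_t^i := |x_i|^2 \sum_{j=1}^{m_t^i} \delta_j^i$, $\check A_t := \sum_{i=1}^{K} \check A_t^i$, $t \ge 0$.

That is, $\check A_t$ is an approximation of $A_t$ that relies only on the communication
times $\{\tau_n^{i,B};\ n \le m_t^i,\ 1 \le i \le K\}$.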
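
The communication scheme of this section can be sketched for a single sensor as follows. The sketch approximates the continuous path on a fine grid, so a small overshoot remains at each message; the grid step `dt`, the seed and the function names are assumptions of the example. A message carries the bit 1 when $B^i = x_i Y^i$ has increased by at least the threshold since the previous message and 0 when it has decreased by at least the threshold.

```python
import math
import random
from typing import List, Tuple

Message = Tuple[float, int]  # (communication time, transmitted bit)


def simulate_sensor(lam: float, x: float, delta: float, t_end: float,
                    dt: float = 1e-3, seed: int = 0) -> Tuple[List[Message], float]:
    """Return the messages of one sensor and its final value B_t = x * Y_t."""
    rng = random.Random(seed)
    y, b_ref = 0.0, 0.0  # current path value; value of B at the last message
    messages: List[Message] = []
    for k in range(1, int(round(t_end / dt)) + 1):
        y += lam * x * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        b = x * y
        if abs(b - b_ref) >= delta:
            messages.append((k * dt, 1 if b > b_ref else 0))
            b_ref = b
    return messages, x * y


def fusion_statistics(messages: List[Message], x: float, delta: float) -> Tuple[float, float]:
    """Statistic built from the bits, and the time-based approximation of A_t."""
    b_tilde = delta * sum(2 * bit - 1 for _, bit in messages)
    a_check = x * x * (messages[-1][0] if messages else 0.0)
    return b_tilde, a_check


msgs, b_t = simulate_sensor(lam=0.5, x=1.0, delta=2.0, t_end=50.0, seed=3)
b_tilde, a_check = fusion_statistics(msgs, x=1.0, delta=2.0)
print(len(msgs), b_t, b_tilde, a_check)
```

Since the sum of the inter-arrival times up to the last message equals the last communication time, `a_check` never exceeds $A_t = (x_i)^2 t$.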

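Since Brownian motion "restarts" at stopping times, each $(\delta_n^i, z_n^i)_{n \in \mathbb{N}}$ is a
sequence of i.i.d. pairs, thus, each $(m_t^i)_{t \ge 0}$ is a renewal process. Moreover, it
is possible to obtain a series representation for the joint density of the pair
$(\delta_1^i, z_1^i)$ under $P_\lambda$,

$\overline{p}_i(t; \lambda) := \frac{P_\lambda(\delta_1^i \in dt,\ z_1^i = 1)}{dt}$, $\underline{p}_i(t; \lambda) := \frac{P_\lambda(\delta_1^i \in dt,\ z_1^i = 0)}{dt}$.

This representation is the content of the following lemma, for which we need
to define the following functions:

$g(t; x) := \sum_{n=-\infty}^{\infty} h(t; (4n+1)x)$, $h(t; x) := \frac{x}{\sqrt{2\pi t^3}}\, e^{-x^2/(2t)}$, $t, x > 0$.

Lemma 4.1. For every $1 \le i \le K$ and $t > 0$,

$\overline{p}_i(t; \lambda) = e^{\lambda \Delta^i - 0.5 (\lambda x_i)^2 t}\, g(t; \Delta^i/|x_i|)$, $\underline{p}_i(t; \lambda) = e^{-\lambda \Delta^i - 0.5 (\lambda x_i)^2 t}\, g(t; \Delta^i/|x_i|)$.

Proof. From (4.2) and (4.4) we have

(4.6) $\delta_n^i = \inf\{t \ge 0 : |Y^i_{\tau_{n-1}^{i,B} + t} - Y^i_{\tau_{n-1}^{i,B}}| \ge \Delta^i/|x_i|\}$, $n \in \mathbb{N}$.

Since $Y^i$ is a standard Brownian motion under $P_0$, it is well known (see,
e.g., [11], page 99) that $\overline{p}_i(t; 0) = \underline{p}_i(t; 0) = g(t; \Delta^i/|x_i|)$. Then, changing the
measure $P_\lambda \to P_0$ (similarly, e.g., to [11], page 196), we obtain the desired
result. □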
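
The series that appears in Lemma 4.1 can be evaluated numerically by truncation. The sketch below assumes the series runs over all integers $n$, with the kernel $h$ evaluated by the same formula at negative arguments, and checks two standard facts for a driftless Brownian motion leaving $(-x, x)$: each boundary is reached first with probability 1/2, and the expected exit time is $x^2$. The truncation level, the integration grid and the function names are assumptions of the example.

```python
import math


def h(t: float, x: float) -> float:
    """Kernel x / sqrt(2 pi t^3) * exp(-x^2 / (2 t))."""
    return x / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-x * x / (2.0 * t))


def g(t: float, x: float, n_terms: int = 15) -> float:
    """Truncated series: sum over n = -n_terms..n_terms of h(t; (4n + 1) x)."""
    return sum(h(t, (4 * n + 1) * x) for n in range(-n_terms, n_terms + 1))


def integrate(f, a: float, b: float, steps: int) -> float:
    """Composite trapezoidal rule on [a, b]."""
    dt = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + k * dt) for k in range(1, steps))
    return total * dt


x = 1.0
mass = integrate(lambda t: g(t, x), 1e-6, 20.0, 20000)
mean_exit = integrate(lambda t: 2.0 * t * g(t, x), 1e-6, 20.0, 20000)
print(mass, mean_exit)
```

The first printed value should be close to 1/2 and the second close to $x^2$; multiplying $g$ by the exponential factor of the lemma would give the corresponding densities under $P_\lambda$.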



The following lemma describes some properties of the communication
scheme that remain valid in the case of discrete sampling at the sensors,
which we treat in Section 4.2. In order to lighten the notation, we denote
by $\Theta(\Delta^i)$ a term that when divided by $\Delta^i$ is asymptotically bounded from
above and below as $\Delta^i \to \infty$.



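Lemma 4.2. (a) For any $t, \Delta^i > 0$,

(4.7) $E_\lambda\Bigl[\sum_{j=1}^{m_t^i+1} \delta_j^i - t\Bigr] \le \frac{E_\lambda[(\delta_1^i)^2]}{E_\lambda[\delta_1^i]}$,

(4.8) $E_\lambda\Bigl[t - \sum_{j=1}^{m_t^i} \delta_j^i\Bigr] \le \frac{E_\lambda[(\delta_1^i)^2]}{E_\lambda[\delta_1^i]}$.

(b) As $t, \Delta^i \to \infty$,

(4.9) $E_\lambda[\delta_1^i] = \Theta(\Delta^i)$, $V_\lambda[\delta_1^i] = \Theta(\Delta^i)$,

(4.10) $0 \le E_\lambda[A_t^i - \check A_t^i] \le \Theta(\Delta^i)$,

(4.11) $E_\lambda[m_t^i] \le t/\Theta(\Delta^i) + 1/\Theta(\Delta^i)$.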



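Proof. (a) Since $(\delta_n^i)_{n \in \mathbb{N}}$ is a sequence of i.i.d. random variables, (4.7)
follows from Theorem 1 in Lorden [14] and (4.8) from Lorden [14], page 526.

(b) Recall from (4.6) that $\delta_1^i$ is the first time a Brownian motion with
drift $\lambda x_i$ exits the symmetric interval $(-\Delta^i/|x_i|, \Delta^i/|x_i|)$. Then, as $\Delta^i \to \infty$,
from Wald's identity we have

(4.12) $E_\lambda[\delta_1^i] = \frac{\Delta^i/|x_i|}{|\lambda x_i|}\,(1 + o(1))$,

whereas from Martinsek [16] we have

(4.13) $V_\lambda[\delta_1^i] = \frac{\Delta^i/|x_i|}{|\lambda x_i|^3}\,(1 + o(1))$.

Then, from (4.12) and (4.13) we obtain (4.9), whereas from (4.5), (4.8) and
(4.9) we obtain (4.10).

Finally, since $m_t^i + 1$ is a stopping time with respect to the filtration
generated by the pairs $(\delta_n^i, z_n^i)_{n \in \mathbb{N}}$, from Wald's identity and (4.7) we have

$E_\lambda[m_t^i + 1]\, E_\lambda[\delta_1^i] = E_\lambda\Bigl[\sum_{j=1}^{m_t^i+1} \delta_j^i\Bigr] \le t + \frac{E_\lambda[(\delta_1^i)^2]}{E_\lambda[\delta_1^i]}$

and, consequently,

$E_\lambda[m_t^i] \le \frac{t}{E_\lambda[\delta_1^i]} + \frac{V_\lambda[\delta_1^i]}{(E_\lambda[\delta_1^i])^2}$.

From this inequality and (4.9) we obtain (4.11), which completes the proof. □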
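
The linear growth of the mean inter-arrival time in the threshold can be checked against a closed form. For a Brownian motion with drift $\mu \ne 0$ and unit variance started at 0, the standard exit probabilities combined with Wald's identity give an expected exit time from $(-a, a)$ equal to $(a/\mu)\tanh(a\mu)$; this closed form is a textbook fact brought in for the illustration and is not quoted from the paper. With $a = \Delta^i/|x_i|$ and $\mu = \lambda x_i$, the sketch compares it with the leading term $\Delta^i/(|\lambda| x_i^2)$. The function names are hypothetical.

```python
import math


def expected_exit_time(lam: float, x: float, delta: float) -> float:
    """E[exit time] of Y_t = lam*x*t + W_t from (-delta/|x|, delta/|x|), lam*x != 0."""
    a = delta / abs(x)
    mu = lam * x
    return a * math.tanh(a * mu) / mu


def leading_term(lam: float, x: float, delta: float) -> float:
    """Large-threshold approximation delta / (|lam| * x^2)."""
    return delta / (abs(lam) * x * x)


ratios = [expected_exit_time(0.5, 2.0, d) / leading_term(0.5, 2.0, d)
          for d in (1.0, 5.0, 25.0)]
print(ratios)
```

The ratios increase to 1 as the threshold grows, which is the $1 + o(1)$ behavior used in the proof.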

4.1. Likelihood-based estimation at the fusion center. Let $\tilde{\mathcal{L}}_t(\lambda)$ and $\tilde{\ell}_t(\lambda)$
be the likelihood and the log-likelihood function of $\lambda$ that correspond to $\tilde{\mathcal{F}}_t$,
the accumulated information at the fusion center up to time $t$. The following
proposition describes the structure of the corresponding score function.



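Proposition 4.1. For any $t \ge 0$,

(4.14) $\frac{d \tilde{\ell}_t(\lambda)}{d\lambda} = \sum_{i=1}^{K} \{E_\lambda[B_t^i \mid m_t^i] - \lambda A_t^i\} + \{\tilde B_t - \lambda \check A_t\}$.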



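Proof. Suppose that $m_t^i = m_i$, that is, sensor $i$ has transmitted $m_i$
messages to the fusion center up to time $t$, where $m_i$ is some nonnegative
integer. Then, since all pairs $\{(z_n^i, \delta_n^i),\ n \in \mathbb{N},\ 1 \le i \le K\}$ are independent,
the fusion likelihood function has the following form:

$\tilde{\mathcal{L}}_t(\lambda) := \prod_{i=1}^{K} P_\lambda(m_t^i = m_i) \Bigl( \prod_{n=1}^{m_i} \overline{p}_i(\delta_n^i; \lambda)^{z_n^i}\, \underline{p}_i(\delta_n^i; \lambda)^{1 - z_n^i} \Bigr)^{1\{m_i > 0\}}$.

Due to Lemma 4.1, the corresponding log-likelihood function becomes

$\tilde{\ell}_t(\lambda) = \sum_{i=1}^{K} \log P_\lambda(m_t^i = m_i) + \sum_{i=1}^{K} 1\{m_i > 0\} \sum_{n=1}^{m_i} \Bigl[ \lambda \Delta^i z_n^i - \lambda \Delta^i (1 - z_n^i) - \frac{(\lambda x_i)^2 \delta_n^i}{2} + \log g(\delta_n^i; \Delta^i/|x_i|) \Bigr]$,

so that

$\frac{d \tilde{\ell}_t(\lambda)}{d\lambda} = \sum_{i=1}^{K} \frac{d}{d\lambda} \log P_\lambda(m_t^i = m_i) + \sum_{i=1}^{K} \sum_{n=1}^{m_i} \bigl[ \Delta^i z_n^i - \Delta^i (1 - z_n^i) - \lambda (x_i)^2 \delta_n^i \bigr]$.

Then, recalling the definition of $\tilde B$ in (3.6)-(3.7) and of $\check A$ in (4.5),

$\frac{d \tilde{\ell}_t(\lambda)}{d\lambda} = \sum_{i=1}^{K} \frac{d}{d\lambda} \log P_\lambda(m_t^i = m_i) + \{\tilde B_t - \lambda \check A_t\}$.

Since $\{m_t^i = m_i\} \in \mathcal{F}_t^i$, changing the measure $P_\lambda \to P_0$ we have

$P_\lambda(m_t^i = m_i) = E_0\bigl[ e^{\lambda B_t^i - \lambda^2 A_t^i/2}\, 1\{m_t^i = m_i\} \bigr]$

and, consequently,

$\frac{d}{d\lambda} \log P_\lambda(m_t^i = m_i) = \frac{E_0\bigl[ e^{\lambda B_t^i - \lambda^2 A_t^i/2} (B_t^i - \lambda A_t^i)\, 1\{m_t^i = m_i\} \bigr]}{P_\lambda(m_t^i = m_i)}$

$= \frac{E_\lambda[B_t^i\, 1\{m_t^i = m_i\}] - \lambda A_t^i\, P_\lambda(m_t^i = m_i)}{P_\lambda(m_t^i = m_i)}$

$= E_\lambda[B_t^i \mid m_t^i = m_i] - \lambda A_t^i$,

which implies (4.14). □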

Note that the second term in (4.14) reflects the information from the
communication times and the transmitted messages, whereas the first term
reflects the information between transmissions.

At time $t$, the fusion center should ideally estimate $\lambda$ with the fusion
center MLE, that is, the root of the score function (4.14). However, since
$E_\lambda[B_t^i \mid m_t^i]$ does not admit a simple, closed-form expression as a function
of $\lambda$, we can only approximate this conditional expectation and obtain an
approximate fusion center MLE.

If we replace each $E_\lambda[B_t^i \mid m_t^i]$ with the corresponding unconditional
expectation, $E_\lambda[B_t^i] = \lambda A_t^i$, the first term in (4.14) vanishes and we obtain the
following estimator:

(4.15) $\check\lambda_t := \frac{\tilde B_t}{\check A_t}$, $t \ge \min_{1 \le i \le K} \tau_1^{i,B}$.

On the other hand, if we approximate $E_\lambda[B_t^i \mid m_t^i]$ with $\lambda \check A_t^i$, we recover the
estimator $\{\tilde\lambda_t\}$ that was defined in (3.13) and whose asymptotic properties
were established in Theorem 3.1. In the following proposition we show that,
in the special Brownian case that we consider in this section, $\check\lambda_t$ has similar
asymptotic behavior as $\tilde\lambda_t$.
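
The two approximate estimators differ only in the denominator, which the following sketch makes explicit for a hand-made message log. It assumes the Brownian setting of this section, a symmetric threshold per sensor, a statistic built from the bits as threshold times (number of ones minus number of zeros), the known information $t \sum_i x_i^2$ for one estimator and its time-based approximation for the other. The function name and the data are hypothetical.

```python
from typing import List, Sequence, Tuple

Message = Tuple[float, int]  # (arrival time, bit)


def fusion_estimates(logs: Sequence[List[Message]], x: Sequence[float],
                     delta: Sequence[float], t: float) -> Tuple[float, float]:
    """Return (B~ / A_t, B~ / A-check_t) from the per-sensor message logs."""
    b_tilde = sum(d * sum(2 * bit - 1 for _, bit in log)
                  for log, d in zip(logs, delta))
    a_t = t * sum(xi * xi for xi in x)
    a_check = sum(xi * xi * (log[-1][0] if log else 0.0)
                  for log, xi in zip(logs, x))
    if a_check == 0.0:
        raise ValueError("no message received yet: the second estimator is undefined")
    return b_tilde / a_t, b_tilde / a_check


logs = [[(1.0, 1), (2.5, 1), (4.0, 0)],   # sensor 0
        [(0.5, 1), (3.0, 1)]]             # sensor 1
lam_tilde, lam_check = fusion_estimates(logs, x=[1.0, 2.0], delta=[2.0, 2.0], t=5.0)
print(lam_tilde, lam_check)
```

The second estimator can only be computed after the first message arrives, which mirrors the restriction on $t$ in (4.15).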

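Proposition 4.2. If $t, \Delta \to \infty$ so that $\Delta = o(t)$, then $\check\lambda_t$ converges to $\lambda$
in probability. If additionally $\Delta = o(\sqrt{t})$, then $\sqrt{A_t}\,(\check\lambda_t - \lambda) \to \mathcal{N}(0, 1)$, that
is, $\check\lambda_t$ is an asymptotically optimal estimator of $\lambda$.

Proof. From the definition of $\tilde\lambda_t$ in (3.13) and $\check\lambda_t$ in (4.15) we have

(4.16) $\check\lambda_t - \tilde\lambda_t = \frac{\tilde B_t}{\check A_t} - \frac{\tilde B_t}{A_t} = \frac{A_t}{\check A_t} \cdot \frac{A_t - \check A_t}{A_t} \cdot \tilde\lambda_t$, $t \ge 0$.

From (4.10) it follows that

(4.17) $0 \le E_\lambda\Bigl[\frac{A_t - \check A_t}{A_t}\Bigr] = \frac{1}{A_t} \sum_{i=1}^{K} E_\lambda[A_t^i - \check A_t^i] \le \frac{1}{A_t} \sum_{i=1}^{K} \Theta(\Delta^i)$.

Therefore, Markov's inequality implies that $(A_t - \check A_t)/A_t$ converges to 0 and
$A_t/\check A_t$ converges to 1 in probability as $t, \Delta \to \infty$ so that $\Delta = o(t)$, since
$A_t$ is a linear function of $t$. Moreover, from Theorem 3.1 we know that $\tilde\lambda_t$
converges to $\lambda$ in probability if $\Delta = o(t)$. Thus, we conclude that $\check\lambda_t$ also
converges to $\lambda$ in probability as $t, \Delta \to \infty$ so that $\Delta = o(t)$.

In order to prove that $\check\lambda_t$ is asymptotically optimal, it suffices to show that
$\sqrt{A_t}\,|\check\lambda_t - \tilde\lambda_t|$ converges to 0 in probability as $t, \Delta \to \infty$ so that $\Delta = o(\sqrt{t})$,
which also follows from (4.16) and (4.17). □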

4.2. The case of discrete sampling. We now assume that each sensor
observes its underlying process only at a sequence of discrete and equidistant
times $\{nh,\ n \in \mathbb{N}\}$, where $h > 0$ is a common sampling period. Thus, in what
follows, $t = h, 2h, \ldots$. The goal is to examine the effect of discrete sampling
on the proposed estimating scheme.

First of all, we observe that the centralized estimator,

(4.18) $\hat\lambda_t = \frac{B_t}{A_t} = \frac{\sum_{i=1}^{K} x_i Y_t^i}{A_t}$,

is not affected by the discrete sampling of the underlying processes and
(2.18) remains valid, that is, $\sqrt{A_t}\,(\hat\lambda_t - \lambda) \sim \mathcal{N}(0, 1)$ for every $t = h, 2h, \ldots$.
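
The exact normality of the centralized estimator can be checked by Monte Carlo, because only the terminal values $Y_t^i \sim \mathcal{N}(\lambda x_i t, t)$ enter (4.18). The sketch below is a sanity check under the Brownian model of this section; the parameter values, the seed and the function names are assumptions of the example.

```python
import math
import random
import statistics
from typing import List, Sequence


def centralized_estimate(y_t: Sequence[float], x: Sequence[float], t: float) -> float:
    """Estimator (4.18): sum_i x_i * Y_t^i divided by A_t = t * sum_i x_i^2."""
    return sum(xi * yi for xi, yi in zip(x, y_t)) / (t * sum(xi * xi for xi in x))


def normalized_errors(lam: float, x: Sequence[float], t: float,
                      n_rep: int, seed: int = 0) -> List[float]:
    """Draws of sqrt(A_t) * (estimate - lam) using exact terminal values."""
    rng = random.Random(seed)
    a_t = t * sum(xi * xi for xi in x)
    errors = []
    for _ in range(n_rep):
        y_t = [lam * xi * t + math.sqrt(t) * rng.gauss(0.0, 1.0) for xi in x]
        errors.append(math.sqrt(a_t) * (centralized_estimate(y_t, x, t) - lam))
    return errors


errs = normalized_errors(lam=0.7, x=[1.0, -2.0, 0.5], t=10.0, n_rep=20000, seed=1)
print(statistics.fmean(errs), statistics.pvariance(errs))
```

The sample mean and variance should be close to 0 and 1, whatever sampling period is used, since the estimator depends on the paths only through their values at time $t$.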

Moreover, the pairs $(\delta_n^i, z_n^i)_{n \in \mathbb{N}}$ remain i.i.d. and Lemma 4.2 still holds.
On the other hand, Lemma 4.1 is no longer valid and there is not an explicit
formula for the density of the pair $(\delta_1^i, z_1^i)$. However, the main difference in
the case of discrete sampling is that at any time $\tau_n^{i,B}$ the fusion center learns
whether $B^i$ increased or decreased by at least $\Delta^i$ since $\tau_{n-1}^{i,B}$, but does not
learn by how much exactly. In other words, the fusion center does not learn
the size of the realized overshoots,

(4.19) $\eta_n^i := (B^i_{\tau_n^{i,B}} - B^i_{\tau_{n-1}^{i,B}} - \Delta^i)^+ + (B^i_{\tau_n^{i,B}} - B^i_{\tau_{n-1}^{i,B}} + \Delta^i)^-$, $n \in \mathbb{N}$.

As a result, the statistic $\tilde B^i$, defined in (3.6), is no longer equal to $B^i$ at
the communication times $(\tau_n^{i,B})_{n \in \mathbb{N}}$ and the distance $|\tilde B_t^i - B_t^i|$ is no longer
bounded by $\Delta^i$. Therefore, Theorem 3.1, which establishes the consistency
and asymptotic optimality of the proposed estimator, $\tilde\lambda_t = \tilde B_t/A_t$, under the
assumption of continuous-time sensor observations may not hold when the
sensors observe their underlying processes at discrete times.

Our goal is to determine under what conditions the consistency and
asymptotic optimality of $\tilde\lambda_t$ are preserved in the context of discrete
sampling at the sensors. In order to do so, we need to estimate the inflicted
performance loss due to the unobserved overshoots. The following lemma is
very useful in this direction.

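Lemma 4.3. For every $1 \le i \le K$,

(4.20) $|\tilde B_t^i - B_t^i| \le \Delta^i + \sum_{j=1}^{m_t^i} \eta_j^i$, $t \ge 0$,

and the overshoots $(\eta_n^i)_{n \in \mathbb{N}}$ are i.i.d. with

(4.21) $\sup_{\Delta^i > 0} E_\lambda[\eta_1^i] = O(\sqrt{h})$.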

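Proof. For every $t \ge 0$ we have

$B_t^i - \tilde B_t^i = B_t^i - B^i_{\tau^{i,B}_{m_t^i}} + \sum_{j=1}^{m_t^i} (B^i_{\tau_j^{i,B}} - B^i_{\tau_{j-1}^{i,B}}) - \tilde B_t^i$

$= B_t^i - B^i_{\tau^{i,B}_{m_t^i}} + \sum_{j=1}^{m_t^i} \Bigl[ (B^i_{\tau_j^{i,B}} - B^i_{\tau_{j-1}^{i,B}}) - [\Delta^i z_j^i - \Delta^i (1 - z_j^i)] \Bigr]$,

which implies (4.20). It is obvious that the overshoots $(\eta_n^i)_{n \in \mathbb{N}}$ are i.i.d. In
order to prove (4.21), we write $\delta_1^i = \min\{\underline{\delta}_1^i, \overline{\delta}_1^i\}$, where

$\underline{\delta}_1^i := \inf\{nh : B^i_{nh} \le -\Delta^i\}$, $\overline{\delta}_1^i := \inf\{nh : B^i_{nh} \ge \Delta^i\}$.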

Then, from Theorem 3 of Lorden [14] it follows that for any r > 1, 



(4.22) 



sup Ea[7?1] < max{EA[S|. - A*], -Ea[5^. + A*]} 
A»>0 ^ 



lr + 2Ex[\Bl\r+^ 



- yr + l |EA[i?;,J| 
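Since $Y_h^i \sim \mathcal{N}(\lambda x_i h, h)$ under $P_\lambda$ and $B_h^i = x_i Y_h^i$,

$E_\lambda[B_h^i] = \lambda (x_i)^2 h$,

$E_\lambda[(B_h^i)^4] = (x_i)^4 [(\lambda x_i h)^4 + 6 (\lambda x_i h)^2 h + 3 h^2] = 3 (x_i)^4 h^2 (1 + o(1))$ as $h \to 0$.

Setting $r = 3$ in (4.22) completes the proof. □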
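
The pathwise bound (4.20) can be verified on simulated data. The sketch below samples one sensor exactly at the times $h, 2h, \ldots$ (the increments of $B^i = x_i Y^i$ over one period are Gaussian), sends a bit whenever the change since the last message is at least the threshold, and accumulates the overshoots as in (4.19). The parameter values and the function name `discrete_sensor` are assumptions of the example.

```python
import math
import random
from typing import Tuple


def discrete_sensor(lam: float, x: float, delta: float, h: float,
                    n_samples: int, seed: int = 0) -> Tuple[float, float, float]:
    """Return (B_t, B-tilde_t, sum of overshoots) at t = n_samples * h."""
    rng = random.Random(seed)
    b, b_ref, b_tilde, eta_sum = 0.0, 0.0, 0.0, 0.0
    for _ in range(n_samples):
        # Exact increment of B = x * Y over one sampling period.
        b += x * (lam * x * h + math.sqrt(h) * rng.gauss(0.0, 1.0))
        inc = b - b_ref
        if abs(inc) >= delta:
            b_tilde += delta if inc > 0 else -delta  # the fusion center sees only the bit
            eta_sum += abs(inc) - delta              # unobserved overshoot
            b_ref = b
    return b, b_tilde, eta_sum


results = [discrete_sensor(lam=0.3, x=1.5, delta=2.0, h=0.25, n_samples=4000, seed=s)
           for s in range(5)]
for b, b_tilde, eta_sum in results:
    print(abs(b - b_tilde), 2.0 + eta_sum)
```

In every run the first printed number is at most the second, and the gap between the two statistics grows with the accumulated overshoots rather than staying below the threshold.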
In the following theorem we show that $\tilde\lambda_t$ remains consistent as $t \to \infty$ for
any given, fixed sampling period, $h > 0$, as long as the communication rate
of every sensor is asymptotically low.

Theorem 4.1. If $t, \Delta^i \to \infty$ so that $\Delta^i = o(t)$ for every $1 \le i \le K$, then
$E_\lambda[|\tilde\lambda_t - \lambda|] \to 0$.

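Proof. Since $E_\lambda[|\hat\lambda_t - \lambda|] \to 0$, it suffices to show that $E_\lambda[|\tilde\lambda_t - \hat\lambda_t|] \to 0$.
Indeed, from the definition of the two estimators and (4.20) we have

(4.23) $|\tilde\lambda_t - \hat\lambda_t| \le \frac{1}{A_t} \sum_{i=1}^{K} |\tilde B_t^i - B_t^i| \le \frac{1}{A_t} \sum_{i=1}^{K} \Delta^i + \frac{1}{A_t} \sum_{i=1}^{K} \sum_{j=1}^{m_t^i} \eta_j^i$.

Since $m_t^i + 1$ is a stopping time with respect to the filtration generated by
$(\delta_n^i, z_n^i, \eta_n^i)_{n \in \mathbb{N}}$, from Wald's identity we obtain

(4.24) $E_\lambda\Bigl[\sum_{j=1}^{m_t^i+1} \eta_j^i\Bigr] = E_\lambda[\eta_1^i]\, E_\lambda[m_t^i + 1]$.

Taking expectations in (4.23) and applying (4.24), we obtain

(4.25) $E_\lambda[|\tilde\lambda_t - \hat\lambda_t|] \le \frac{1}{A_t} \sum_{i=1}^{K} \Delta^i + \sum_{i=1}^{K} \frac{E_\lambda[\eta_1^i]\, E_\lambda[m_t^i + 1]}{A_t}$.

Then, from (4.11), (4.21) and the fact that $A_t$ is a linear function of $t$ we
have

(4.26) $E_\lambda[|\tilde\lambda_t - \hat\lambda_t|] \le \frac{1}{A_t} \sum_{i=1}^{K} \Delta^i + \sum_{i=1}^{K} \frac{E_\lambda[\eta_1^i]}{\Theta(\Delta^i)}$.

If some $\Delta^i$ is fixed as $t \to \infty$, the second term in the right-hand side of
(4.26) does not go to 0 (unless $h \to 0$, in which case $E_\lambda[\eta_1^i] \to 0$ for every
$1 \le i \le K$, due to (4.21)). However, if $\Delta^i \to \infty$ so that $\Delta^i = o(t)$ for every
$1 \le i \le K$, then both terms in the right-hand side of (4.26) go to 0 for any
given sampling period, $h > 0$, which completes the proof. □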



The proof of Theorem 4.1 suggests that the proposed estimator is not
consistent when both $\{\Delta^i,\ 1 \le i \le K\}$ and $h$ are held fixed. In other words,
it is necessary to have either a high sampling rate ($h \to 0$) in order to reduce
the size of the unobserved overshoots or a low communication rate in all
sensors ($\Delta^i \to \infty$ for every $1 \le i \le K$) in order to reduce their accumulation rate.

However, an asymptotically low communication rate is not sufficient in
order to preserve the asymptotic optimality of $\tilde\lambda_t$ in the case of discrete
sampling at the sensors. For this, the sampling period $h$ must converge to 0
at an appropriate rate relative to the communication rate and the horizon
of observations, which we specify in the following theorem.

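Theorem 4.2. If $t, \Delta^i \to \infty$ and $h \to 0$ so that

$\Delta^i = o(\sqrt{t})$ and $\sqrt{h} = o(\Delta^i/\sqrt{t})$ for every $1 \le i \le K$,

then $\sqrt{A_t}\,(\tilde\lambda_t - \lambda) \to \mathcal{N}(0, 1)$, that is, $\tilde\lambda_t$ is an asymptotically optimal
estimator.

Proof. Since $\sqrt{A_t}\,(\hat\lambda_t - \lambda) \sim \mathcal{N}(0, 1)$, it suffices to show that $\sqrt{A_t}\,|\tilde\lambda_t - \hat\lambda_t|$
converges to 0 in probability. Indeed, from (4.26) and the fact that $A_t$ is
a linear function of $t$,

(4.27) $\sqrt{A_t}\, E_\lambda[|\tilde\lambda_t - \hat\lambda_t|] \le \frac{1}{\sqrt{A_t}} \sum_{i=1}^{K} \Delta^i + \sum_{i=1}^{K} \frac{E_\lambda[\eta_1^i]}{\Theta(\Delta^i/\sqrt{t})}$.

The first term in the right-hand side goes to 0 if $\Delta^i = o(\sqrt{t})$ for every
$1 \le i \le K$. The second term goes to 0 if $E_\lambda[\eta_1^i] = o(\Delta^i/\sqrt{t})$ for every
$1 \le i \le K$. For the latter, it suffices that $\sqrt{h} = o(\Delta^i/\sqrt{t})$ for every
$1 \le i \le K$, due to (4.21), which completes the proof. □

Remark. If each $\Delta^i$ is fixed as $t \to \infty$, then Theorem 4.2 implies that
$\tilde\lambda_t$ is asymptotically efficient as $t \to \infty$ and $h \to 0$ so that $\sqrt{h}\,\sqrt{t} \to 0$.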
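
To make the rate conditions concrete, consider polynomial schedules $\Delta^i = t^a$ and $h = t^{-b}$. These schedules are an assumption of the example and are not prescribed by the paper. The condition $\Delta^i = o(\sqrt{t})$ then reads $a < 1/2$, and $\sqrt{h} = o(\Delta^i/\sqrt{t})$ reads $b > 1 - 2a$. The sketch below checks the exponents and evaluates the two ratios that must vanish.

```python
import math
from typing import Tuple


def rates_satisfy_theorem(a: float, b: float) -> bool:
    """Delta = t**a and h = t**(-b): growing threshold plus both rate conditions."""
    return 0.0 < a < 0.5 and b > 1.0 - 2.0 * a


def vanishing_ratios(a: float, b: float, t: float) -> Tuple[float, float]:
    """(Delta / sqrt(t), sqrt(h) / (Delta / sqrt(t))) for the polynomial schedules."""
    delta = t ** a
    h = t ** (-b)
    return delta / math.sqrt(t), math.sqrt(h) / (delta / math.sqrt(t))


for t in (1e2, 1e4, 1e6):
    print(t, vanishing_ratios(0.4, 0.5, t))
```

For $a = 0.4$ and $b = 0.5$ both ratios decrease along the printed horizons. With a fixed threshold ($a = 0$) the second condition becomes $b > 1$, that is, $\sqrt{h}\,\sqrt{t} \to 0$, as in the remark.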

5. Conclusions. In this work we considered a parameter estimation prob- 
lem assuming that the statistician collects data from dispersed sensors, which 
observe continuous (possibly correlated) semimartingales with linear drifts 
with respect to a common, unknown parameter. Motivated by sensor net- 
work applications, which are typically characterized by limited communica- 
tion bandwidth, we required that the sensors must send a small number of 
bits per transmission and that they should avoid a high rate of communica- 
tion with the fusion center. 

We proposed a novel methodology for this problem, according to which 
the sensors transmit to the fusion center one-bit messages at first exit times 
of appropriate statistics that they observe locally. The fusion center then 
combines these messages and constructs an estimator that imitates the op- 
timal centralized estimator (which can be computed only if there is full 
access to the sensor observations). 

We proved that the resulting estimator is consistent and, for a large class 
of processes, asymptotically optimal, in the sense that it attains the
performance of the optimal centralized estimator when a sufficiently large horizon
of observations is available. However, it is much more efficient from a
practical point of view, as it reduces dramatically the congestion in the network
and the computational burden at the fusion center. This is the case because 
it requires the transmission of only one-bit messages from the sensors and 
its statistical properties are preserved even with an asymptotically low rate 
of communication. 

It remains an open problem to design estimators with analogous opti- 
mality properties in more complicated setups, such as when there is not 
an explicit form for the optimal centralized estimator, the dimensionality 
of the parameter space is large or the sensors take non-i.i.d., discrete-time 
observations. 

Acknowledgments. The author would like to thank Dr. George V. Mous- 
takides and Dr. Alexandra Chronopoulou for their feedback. Moreover, the 
author is grateful to the two anonymous referees and the Associate Editor for 
their valuable remarks and suggestions that led to a significant improvement 
of earlier versions of this work. 

REFERENCES 

[1] Blum, R. S., Kassam, S. A. and Poor, H. V. (1997). Distributed detection with multiple sensors: Part II-advanced topics. Proc. IEEE 85 64-79.

[2] Brown, B. M. and Hewitt, J. I. (1975). Asymptotic likelihood theory for diffusion processes. J. Appl. Probab. 12 228-238. MR0375693

[3] Brown, B. M. and Hewitt, J. I. (1975). Inference for the diffusion branching process. J. Appl. Probab. 12 588-594. MR0378307

[4] Feigin, P. D. (1976). Maximum likelihood estimation for continuous-time stochastic processes. Adv. in Appl. Probab. 8 712-736. MR0426342

[5] Fellouris, G. and Moustakides, G. V. (2011). Decentralized sequential hypothesis testing using asynchronous communication. IEEE Trans. Inform. Theory 57 534-548. MR2814070

[6] Foresti, G. L., Regazzoni, C. S. and Varshney, P. K., eds. (2003). Multisensor Surveillance Systems: The Fusion Perspective. Kluwer Academic, Dordrecht.

[7] Galtchouk, L. and Konev, V. (2001). On sequential estimation of parameters in semimartingale regression models with continuous time parameter. Ann. Statist. 29 1508-1536. MR1873340

[8] Grenander, U. (1950). Stochastic processes and statistical inference. Ark. Mat. 1 195-277. MR0039202

[9] Han, T. S. and Amari, S. (1995). Parameter estimation with multiterminal data compression. IEEE Trans. Inform. Theory 41 1802-1833.

[10] Han, T. S. and Amari, S. (1998). Statistical inference under multiterminal data compression. IEEE Trans. Inform. Theory 44 2300-2324. MR1658791

[11] Karatzas, I. and Shreve, S. E. (1991). Brownian Motion and Stochastic Calculus, 2nd ed. Graduate Texts in Mathematics 113. Springer, New York. MR1121940

[12] Kutoyants, Y. A. (2004). Statistical Inference for Ergodic Diffusion Processes. Springer, London. MR2144185

[13] Liptser, R. S. and Shiryaev, A. N. (2001). Statistics of Random Processes: Applications, 2nd ed. Applications of Mathematics (New York) 6. Springer, Berlin. MR1800858

[14] Lorden, G. (1970). On excess over the boundary. Ann. Math. Statist. 41 520-527. MR0254981

[15] Luo, Z.-Q. (2005). Universal decentralized estimation in a bandwidth constrained sensor network. IEEE Trans. Inform. Theory 51 2210-2219. MR2235295

[16] Martinsek, A. T. (1981). A note on the variance and higher central moments of the stopping time of an SPRT. J. Amer. Statist. Assoc. 76 701-703. MR0629754

[17] Mel'nikov, A. V. and Novikov, A. A. (1988). Sequential inferences with guaranteed accuracy for semimartingales. Teor. Veroyatn. Primen. 33 480-494. MR0968395

[18] Novikov, A. A. (1972). Sequential estimation of the parameters of processes of diffusion type. Mat. Zametki 12 627-638. MR0317493

[19] Prakasa Rao, B. L. S. (1985). Statistical Inference for Diffusion Type Processes. Arnold, London.

[20] Rabi, M., Moustakides, G. V. and Baras, J. S. (2012). Adaptive sampling for linear state estimation. SIAM J. Control Optim. 50 672-702. MR2914225

[21] Revuz, D. and Yor, M. (1999). Continuous Martingales and Brownian Motion, 3rd ed. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 293. Springer, Berlin. MR1725357

[22] Striebel, C. T. (1959). Densities for stochastic processes. Ann. Math. Statist. 30 559-567. MR0104330

[23] Veeravalli, V. V. (1999). Sequential decision fusion: Theory and applications. J. Franklin Inst. 336 301-322. MR1674584

[24] Viswanathan, R. and Varshney, P. K. (1997). Distributed detection with multiple sensors: Part I-fundamentals. Proc. IEEE 85 54-63.

[25] Xiao, J.-J. and Luo, Z.-Q. (2005). Decentralized estimation in an inhomogeneous sensing environment. IEEE Trans. Inform. Theory 51 3564-3575.

Department of Mathematics 
University of Southern California 
3620 South Vermont Ave. 
KAP 416 

Los Angeles, California 90089-2532 
USA 

E-mail: fellouri@usc.edu



